inst.eecs.berkeley.edu/~eecs251b

# EECS251B: Advanced Digital Circuits and Systems

### Lecture 19 – SRAM

#### **Borivoje Nikolić**

#### **NVIDIA Announces New GPU Architecture**

**March 17, 2024.** Blackwell-architecture GPUs pack 208 billion transistors and are manufactured using a custom-built TSMC 4NP process. All Blackwell products feature two reticle-limited dies connected by a 10 terabytes per second (TB/s) chip-to-chip interconnect in a unified single GPU.



https://www.nvidia.com/en-us/data-center/technologies/blackwellarchitecture/



### Announcements

- Project
  - Midterm reports due tomorrow!
  - Preliminary design review after Spring break
- Homework 3 due tomorrow
  - Quiz 3 after Spring break
- Lab 5 posted today



# SRAM



# 6-T SRAM Cell



Figure: long cell topology (access transistors AXL, AXR; pull-ups PL, PR; pull-downs NL, NR).



 $(W/L)_{NL} > (W/L)_{AXL}$ 



# SRAM Cell Design Trends









Cell in 90nm (1µm<sup>2</sup>)

Cell in 32nm (0.171µm<sup>2</sup>)

- Key enabling technology: STI
- Impact: Increased cell density

### SRAM Cell Trends (22nm)



 $0.092 \mu m^2$  cell in 22nm from Intel (IDF'09)

planar



A little analysis by using a ruler:

- Aspect ratio 2.9
- Height ~178nm, Width ~518nm
- Gate ~ 45nm (Lg is smaller for logic)

0.346µm<sup>2</sup> cell in 45nm from Intel (IEDM'07)

EECS251B L19 SRAM

### 22nm SRAM – Discrete Widths

• FinFET cell design



High-Density Cell (PD:PG:PU)

Low-Voltage Cell (PD:PG:PU)

E. Karl, ISSCC'12





### 14nm SRAM



- Aspect ratio ~2.5
- Cell area = 0.05µm<sup>2</sup>
  - Height = 140nm (2 gate pitches)
  - Width = 350nm
  - Lg  $\sim$  32nm (longer than for logic)
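The cell dimensions quoted for the 22nm and 14nm cells can be cross-checked with a few lines of arithmetic. The sketch below only multiplies the heights and widths given on the slides; the function and dictionary names are illustrative, not from the lecture.

```python
# Cell dimensions quoted on the slides, in nanometres.
CELLS = {
    "22nm": {"height_nm": 178, "width_nm": 518},
    "14nm": {"height_nm": 140, "width_nm": 350},
}


def cell_metrics(height_nm, width_nm):
    """Return (area in um^2, aspect ratio width/height)."""
    area_um2 = height_nm * width_nm / 1e6  # 1 um^2 = 1e6 nm^2
    aspect = width_nm / height_nm
    return area_um2, aspect


for node, dims in CELLS.items():
    area, aspect = cell_metrics(**dims)
    print(f"{node}: area = {area:.3f} um^2, aspect ratio = {aspect:.2f}")
```

The results agree with the quoted 0.092µm<sup>2</sup> (aspect ratio 2.9) and roughly 0.05µm<sup>2</sup> (aspect ratio 2.5).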

### 10nm SRAM





- High-Density Cell (HDC): 1:1:1 (PU:PG:PD)
- Low-Voltage Cell (LVC): 1:1:2 (PU:PG:PD)
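With FinFETs, device strength comes in whole fins, so the cell ratios follow directly from the fin counts. The sketch below is a minimal illustration that assumes every fin contributes equal drive, which ignores the NMOS/PMOS strength difference; `strength_ratios` is a hypothetical helper name.

```python
from fractions import Fraction

# Fin counts per device as (PU, PG, PD), taken from the slide.
CELLS = {"HDC": (1, 1, 1), "LVC": (1, 1, 2)}


def strength_ratios(pu, pg, pd):
    """Return (PD/PG, PG/PU) from fin counts.

    PD/PG indicates read stability (pull-down versus access device).
    PG/PU indicates writability (access device versus pull-up).
    Assumes equal drive per fin, which is a simplification.
    """
    return Fraction(pd, pg), Fraction(pg, pu)


for name, fins in CELLS.items():
    pd_over_pg, pg_over_pu = strength_ratios(*fins)
    print(f"{name}: PD/PG = {pd_over_pg}, PG/PU = {pg_over_pu}")
```

Under this assumption the LVC doubles PD/PG relative to the HDC, at the cost of one more pull-down fin per side.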

**LVC** 0.0367 μm<sup>2</sup>



Guo, ISSCC'18



2CPP = 108nm

 $L_g \sim 20\,\mathrm{nm}$ 



# SRAM: Assist Circuits

# **Basic Ideas**

- Dynamically change voltages
- Negative BL helps with writing
- Lower VDD ( $V_{CELL}$ ) helps with writing
- Higher WL voltage helps with writing; lower hurts
- Lower WL voltage helps with read stability; higher hurts
- Half-select condition: WL selected for write, but write operation is masked (BLs stay high)



### Impact on performance







Zimmer, TCAS-II 2012

### SRAM In Practice

• 7nm AMD Zen2 (Singh, ISSCC'20)










# SRAM Peripheral Circuits

# Peripheral Circuits in SRAM

- Decoders (and pre-decoders)
- Column circuitry: read, write, multiplex, mask
- Write assist techniques
- Read assist techniques
- Redundancy
- BIST
- ECC
- Power management

## Sense-Amp Trigger

- Sense-amp trigger needs to be timed carefully
  - Too early: Incorrect evaluation
  - Too late: Unnecessary timing margin
- Problem: Delay based on inverter chains does not track the delay of the memory cell
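The timing trade-off above can be illustrated with a constant-current bitline model, where the time to develop a differential $\Delta V$ is $t = C_{BL}\,\Delta V / I_{cell}$. This is a rough sketch: the capacitance, currents, required swing, and fixed trigger delay below are all assumed values chosen for illustration, not numbers from the lecture.

```python
def bitline_develop_time(c_bl, delta_v, i_cell):
    """Time for cell current i_cell to discharge c_bl by delta_v (constant-current model)."""
    return c_bl * delta_v / i_cell


# Assumed illustrative values (not from the lecture).
C_BL = 20e-15        # bitline capacitance, F
DELTA_V = 0.10       # differential the sense amp needs, V
I_NOMINAL = 20e-6    # nominal cell read current, A
I_WEAK = 10e-6       # weak (slow-corner) cell read current, A
T_TRIGGER = 150e-12  # fixed delay of an inverter-chain trigger, s

for name, i_cell in [("nominal", I_NOMINAL), ("weak", I_WEAK)]:
    t_needed = bitline_develop_time(C_BL, DELTA_V, i_cell)
    swing_at_trigger = i_cell * T_TRIGGER / C_BL
    verdict = "ok" if T_TRIGGER >= t_needed else "too early"
    print(f"{name}: needs {t_needed * 1e12:.0f} ps, "
          f"swing at trigger = {swing_at_trigger * 1e3:.0f} mV ({verdict})")
```

With these numbers the nominal cell needs 100 ps and the weak cell needs 200 ps, so a fixed 150 ps trigger wastes margin on the first and fires too early on the second. A trigger that does not track the cell current cannot satisfy both.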



### Aside: Delay Lines, Replicas and Time Amplification

- We will encounter these several times in this course
  - Used in a wide range of mixed-signal circuits
- A simple delay line



Time-to-digital converter (TDC)



Start-Stop difference read out as a thermometer-coded binary value

Resolution set by inverter delay
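A behavioral sketch of a delay-line TDC makes the thermometer code and the resolution limit concrete. It assumes an idealized line in which every stage has the same delay `tau` and flip-flop `k` samples tap `k` when Stop arrives; the function names are hypothetical and times are integer picoseconds.

```python
def tdc_thermometer(t_start, t_stop, tau, n_stages):
    """Return the thermometer code of an ideal delay-line TDC.

    Tap k carries the Start edge delayed by (k + 1) * tau. The flip-flop on
    tap k reads 1 if that delayed edge has arrived by the time Stop fires.
    """
    dt = t_stop - t_start
    return [1 if dt >= (k + 1) * tau else 0 for k in range(n_stages)]


def thermometer_to_count(code):
    """Number of taps the Start edge has passed."""
    return sum(code)


code = tdc_thermometer(t_start=0, t_stop=37, tau=10, n_stages=8)
count = thermometer_to_count(code)
print(code, "->", count * 10, "ps measured for a 37 ps interval")
```

Here a 37 ps interval reads out as three taps, i.e. 30 ps: the quantization error is bounded by one inverter delay.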

Sub-inverter delays are hard to generate

Small $\alpha$ requires large area



Lee, Abidi, JSSC 4/08

# Sense-Amp Triggering

• Replica bitline



Block decoder

Replica delay tracks better across corners, but still mistracks across a wide range of supplies

Amrutur, Horowitz, JSSC 8/98

### **Time Amplification**

• Time amplified through metastability (by using setup time characteristics)





Lee, Abidi, JSSC 4/08

Time amplifier



 $T_{out} > T_{in}$, adjustable by $T_{off}$ and $C$


### Voltage Scaling: Multiplicative Replica Bitline

#### Conventional replica





*n* replica cells discharge the replica BL in parallel, reducing the current/cell variation by $\sqrt{n}$

The discharge threshold is set accordingly to $V_{DD} - nV_{os}$, which limits *n* to ~2-4
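The $\sqrt{n}$ claim can be checked with a small Monte Carlo experiment. The sketch below assumes independent, Gaussian-distributed cell currents with a 20% relative sigma, which is an illustrative assumption rather than a measured figure; `sigma_of_average` is a hypothetical helper.

```python
import random


def sigma_of_average(n, sigma_rel=0.2, trials=20000, seed=1):
    """Standard deviation of the per-cell average current of n parallel replica cells.

    Each cell current is drawn independently from a Gaussian with mean 1.0
    and standard deviation sigma_rel (an assumed distribution).
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        currents = [rng.gauss(1.0, sigma_rel) for _ in range(n)]
        averages.append(sum(currents) / n)
    mean = sum(averages) / trials
    variance = sum((a - mean) ** 2 for a in averages) / trials
    return variance ** 0.5


for n in (1, 2, 4):
    print(f"n = {n}: sigma of average current = {sigma_of_average(n):.3f}")
```

Going from one to four cells roughly halves the spread, consistent with $1/\sqrt{n}$. The model does not capture the $V_{DD} - nV_{os}$ threshold constraint that caps *n* in practice.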



#### Multiplicative replica



- Programmable replica delay
- Multiplicative replica scales the delay, w/o increasing variance correspondingly



Forward path digitizes SAEi to CK delay

Backward path multiplies





# Redundancy and ECC


- Redundancy
  - Spare columns (or rows)
  - Selected at test via eFuse
  - Possible to dynamically program redundancy

- ECC
  - Error detection/correction codes
  - Parity
  - SECDED
  - DECTED



## Redundancy

• Principle



Columns



Rows



Horiguchi, Itoh, Springer 2011.

McPartland, CICC'00.


### Redundancy

• Effectiveness (Bickford, 2008)



Figure 1: Modeled Yield impact comparison for 65 nm SRAM compiler. Vmin cell fail rate used in analysis shown in the left chart is 5.10 sigma. Vmin cell fail rate used in the analysis shown in the right chart is 5.20 sigma. 147 Kbit segment is a standardized array size block segment used for comparison purposes.

# Soft Errors

- From packaging and cosmic rays
- Packaging:
  - Lead ore contains Pb-210 -> (22.3 years) -> Bi-210 -> (5 days) -> Po-210
  - Then Po-210 -> (138.4 days) -> Pb-206, emitting an alpha particle
  - Need 'old lead'
- Cosmic rays
  - Large particles collide with Earth's atmosphere to produce alpha (and other) particles
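The 'old lead' requirement follows from exponential decay: the alpha activity from packaging lead is fed by Pb-210, so it falls by half every 22.3 years once the lead is separated from its ore. A minimal sketch, taking the 22.3-year figure as the Pb-210 half-life:

```python
PB210_HALF_LIFE_YEARS = 22.3


def fraction_remaining(age_years, half_life_years=PB210_HALF_LIFE_YEARS):
    """Fraction of the original Pb-210 still present after age_years."""
    return 0.5 ** (age_years / half_life_years)


for age in (0, 22.3, 100, 223):
    print(f"{age:>6} years: {fraction_remaining(age):.4f} of Pb-210 remains")
```

After ten half-lives (223 years) less than 0.1% of the Pb-210 is left, which is why lead that was smelted long ago is preferred.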

# Error Correction

- Parity: Single Error Detection (SED)
  - $\mathbf{p} = \mathbf{d}_7 \oplus \mathbf{d}_6 \oplus \mathbf{d}_5 \oplus \mathbf{d}_4 \oplus \mathbf{d}_3 \oplus \mathbf{d}_2 \oplus \mathbf{d}_1 \oplus \mathbf{d}_0$

- Single Error Correction Double Error Detection (SECDED)
  - Hamming codes with additional parity



- Double Error Correction Triple Error Detection (DECTED)
  - BCH codes, with higher decoding complexity
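The first two schemes above can be sketched in a few lines. The example uses a 4-bit data word with an extended Hamming (8,4) code to keep it short; real SRAM macros protect much wider words, and the bit ordering and function names here are illustrative choices.

```python
def parity(bits):
    """XOR of all bits: 0 for an even number of ones, 1 for odd."""
    p = 0
    for b in bits:
        p ^= b
    return p


def secded_encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity; indices 1..7 form a Hamming(7,4)
    codeword with check bits at positions 1, 2 and 4.
    """
    c = [0] * 8
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = parity(c[1:])
    return c


def secded_decode(codeword):
    """Return (status, data); status is 'ok', 'corrected' or 'double'."""
    c = list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos  # XOR of the positions holding a 1
    overall = parity(c)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single error at index `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected, not correctable.
        return "double", None
    return status, [c[3], c[5], c[6], c[7]]


word = [1, 0, 1, 1]
cw = secded_encode(word)
cw[6] ^= 1  # inject a single-bit error
print(secded_decode(cw))
```

Plain parity only reports that an odd number of bits flipped. Adding the Hamming check bits locates a single flip, and the extra overall parity bit separates single from double errors.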

### **Multi-bit Errors**



Kawahara, ISSCC'07 tutorial



### Multi-bit Errors: Interleaving
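Column interleaving places bits of different words in adjacent physical columns, so a particle strike that upsets several neighboring cells in a row corrupts at most one bit per word, which SECDED can then correct. The mapping below is one common arrangement assumed for illustration; the function names are hypothetical.

```python
def physical_to_logical(col, interleave):
    """Map a physical column to (word index, bit index) for a given interleave factor."""
    return col % interleave, col // interleave


def bits_hit_per_word(first_col, width, interleave):
    """Count upset bits per word for a strike covering `width` adjacent columns."""
    hits = {}
    for col in range(first_col, first_col + width):
        word, _bit = physical_to_logical(col, interleave)
        hits[word] = hits.get(word, 0) + 1
    return hits


# A strike upsetting 3 adjacent cells starting at physical column 5.
print("4-way interleaved:", bits_hit_per_word(5, 3, interleave=4))
print("not interleaved:  ", bits_hit_per_word(5, 3, interleave=1))
```

With 4-way interleaving the three upsets land in three different words; without interleaving they all fall in the same word and exceed what SECDED can correct.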





# **6T SRAM Alternatives**





- Read circuit?
- Interleaving?

- Dual-port read/write capability (register-file-like cells)
- N0, N1 separate read and write
  - No Read SNM constraint
  - Half-selected cells still undergo read
- Stacked transistors reduce leakage

L. Chang, VLSI Circuits 2005

# eDRAM

Barth, ISSCC'07, Wang, IEDM'06

Process cost: Added trench capacitor



Figure: eDRAM cell cross-section (labels: strap, BOX, DT) and simulated waveforms, Volts vs. Time (ns), nominal process, 1V, 85°C. Write panel (signals wl0, wl1, rbl, lbl, node): Write '1' and Write '0', with strong '1' and strong '0' node levels. Read panel: Read '1' and Read '0', with weak '1' and weak '0' node levels; SETP fires, refreshing '0'.

#### Column Switch Charge Share
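Charge sharing between two capacitors conserves charge, so the settled voltage is the capacitance-weighted average of the two starting voltages. The sketch below applies that to a storage cell and a bitline; the capacitance and voltage values are assumed for illustration and are not taken from the cited papers.

```python
def charge_share_voltage(v_cell, v_pre, c_cell, c_bl):
    """Final voltage after connecting a cell capacitor to a precharged bitline."""
    return (c_cell * v_cell + c_bl * v_pre) / (c_cell + c_bl)


# Assumed illustrative values.
C_CELL = 20e-15  # storage capacitor, F
C_BL = 40e-15    # local bitline capacitance, F
V_PRE = 1.0      # bitline precharge level, V

for stored, v_cell in [("'1'", 1.0), ("'0'", 0.0)]:
    v_final = charge_share_voltage(v_cell, V_PRE, C_CELL, C_BL)
    print(f"stored {stored}: bitline settles at {v_final:.3f} V "
          f"(signal {v_final - V_PRE:+.3f} V)")
```

A larger ratio of cell to bitline capacitance gives a larger read signal, which is one reason short local bitlines are attractive.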


### **Crosspoint Memories**

• Barrett, IRE Trans. Comp. 1961.



Fig. 2—Memory structure.  $I_1$  and  $I_2$  are access drive currents to core-selection switch. Presence or absence of a magnet over a twistor-strip solenoid crosspoint yields a "zero" or "one." Signals observed between twistor and return wire.

### **Crosspoint Memories**

Figure: cover of *Electronics* (A McGraw-Hill Publication), September 28, 1970, with the headline "Amorphous semiconductors: jury still out".





- Neale, Nelson, Moore, Electronics'70
  - 16 x 16 array (256b) of 'read-mostly memory'



### **Crosspoint Memory**



- Four modes
  - Form
  - Set
  - Reset
  - Read



# 3D Crosspoint Arrays

Kau, IEDM'09






• Yeh, JSSC'15



### **Crosspoint Arrays**

• Read and sneak currents
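A lumped model gives a feel for how sneak currents grow with array size. The sketch assumes an idealized $N \times N$ array with no selector devices, no wire resistance, unselected lines left floating, and every unselected cell in the same resistance state. Under those assumptions the sneak path is three groups of parallel cells in series: $N-1$ cells, then $(N-1)^2$ cells, then $N-1$ cells. The resistance values are illustrative.

```python
def sneak_resistance(n, r_cell):
    """Lumped sneak-path resistance of an n x n array with floating unselected lines."""
    if n < 2:
        raise ValueError("a sneak path needs at least a 2 x 2 array")
    m = n - 1
    return r_cell / m + r_cell / (m * m) + r_cell / m


def read_currents(v_read, r_selected, n, r_unselected):
    """Return (selected-cell current, sneak current) for a read at v_read."""
    i_cell = v_read / r_selected
    i_sneak = v_read / sneak_resistance(n, r_unselected)
    return i_cell, i_sneak


# Worst case for reading a high-resistance cell: all other cells low-resistance.
R_HIGH, R_LOW, V_READ = 1e6, 1e4, 0.2  # assumed values
for n in (2, 16, 128):
    i_cell, i_sneak = read_currents(V_READ, R_HIGH, n, R_LOW)
    print(f"N = {n:>3}: I_cell = {i_cell * 1e6:.2f} uA, I_sneak = {i_sneak * 1e6:.2f} uA")
```

Even in a 2 x 2 array the three-cell sneak path can carry more current than a high-resistance selected cell, which is why practical arrays add selector devices or bias the unselected lines.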





Bae, TED 4/17

# Summary

- SRAM periphery
  - Decoders
  - Assist circuits
  - Sense amp timing replicas
- 6-T SRAM alternatives
  - 8-T SRAM
  - eDRAM
  - Crosspoint arrays (e.g. RRAM)



# Next Lecture

- Spring break
- Low power design